A tractable framework for estimating and combining spectral source models for audio source separation
Authors
Abstract
The underdetermined blind audio source separation (BSS) problem is often addressed in the time-frequency (TF) domain by assuming that each TF point is modeled as an independent random variable with a sparse distribution. In contrast, methods based on structured spectral models, such as Spectral Gaussian Scale Mixture Models (Spectral-GSMMs) or Spectral Nonnegative Matrix Factorization models, perform better because they exploit the statistical diversity of audio source spectrograms, which allows them to go beyond the simple sparsity assumption. However, for discrete state-based models such as Spectral-GSMMs, learning the models from the mixture can be computationally very expensive. One of the main problems is that a classical Expectation-Maximization procedure often leads to a complexity that is exponential in the number of sources. In this paper, we propose a framework with linear complexity for learning spectral source models (including discrete state-based models) from noisy source estimates. Moreover, this framework allows probabilistic models of different natures to be combined, which can be seen as a form of probabilistic fusion. We illustrate that methods based on this framework can significantly improve BSS performance compared to state-of-the-art approaches.
Key-words: Blind source separation, multichannel audio, Gaussian mixture model, expectation-maximization algorithm, convolutive mixture
This work was supported in part by the EU FET-Open project FP7-ICT-225913-SMALL and by OSEO, the French State agency for innovation, under the Quaero program.
∗ Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland.
e-mail: simon.arberet@epfl.ch † [email protected] ‡ [email protected] § [email protected]
inria-00572249, version 1 - 1 Mar 2011
French title: A Framework of Tractable Complexity for Estimating and Combining Spectral Models for Audio Source Separation
French abstract (Résumé): Blind audio source separation (BSS) is often addressed in the time-frequency (TF) plane, under the assumption that each TF point is the realization of an independent random variable with a sparse distribution. On the other hand, methods based on a spectral model, such as spectral Gaussian mixture models (Spectral-GMMs) or spectral nonnegative matrix factorization models (Spectral-NMF), achieve better results because they exploit the statistical diversity of audio source spectrograms, making it possible to go beyond the simple sparsity assumption. However, for discrete-state models such as Spectral-GMMs, learning from the mixture can be prohibitively complex. One of the major problems is that using the Expectation-Maximization (EM) procedure leads to a computational complexity that is exponential in the number of sources. In this article, we propose a framework of linear computational complexity for learning source models (including discrete-state models) from noisy estimates of the sources. In addition, this framework makes it possible to combine probabilistic models of different natures, thereby performing a kind of probabilistic "fusion". We show that methods built on this framework improve BSS performance relative to state-of-the-art methods.
French keywords (Mots-clés): Blind source separation, multichannel audio, source mixture models, Expectation-Maximization, convolutive mixtures
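The complexity contrast described in the abstract can be illustrated with a simple counting argument. The sketch below is not the authors' algorithm: it only assumes that each of J sources has K discrete spectral states, so that an EM procedure run directly on the mixture has to consider every joint combination of states, whereas learning each source model separately from its own noisy estimate touches each source's states once. The function names and the values of J and K are hypothetical and chosen for illustration only.

```python
from itertools import product


def joint_state_count(num_sources: int, states_per_source: int) -> int:
    """Joint state combinations when all sources are modeled together (K ** J)."""
    return states_per_source ** num_sources


def per_source_state_count(num_sources: int, states_per_source: int) -> int:
    """States visited when each source model is handled on its own (J * K)."""
    return num_sources * states_per_source


def enumerate_joint_states(num_sources: int, states_per_source: int):
    """Explicitly list every joint state tuple, one state index per source."""
    return list(product(range(states_per_source), repeat=num_sources))


if __name__ == "__main__":
    K = 8  # hypothetical number of states per source
    for J in (2, 3, 4):  # hypothetical numbers of sources
        print(J, joint_state_count(J, K), per_source_state_count(J, K))
```

Under these assumptions the joint count grows exponentially with the number of sources while the per-source count grows linearly, which is the gap the proposed framework is said to close.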
Similar resources
Blind Spectral-GMM Estimation for Underdetermined Instantaneous Audio Source Separation
The underdetermined blind audio source separation problem is often addressed in the time-frequency domain by assuming that each time-frequency point is an independently distributed random variable. Other approaches which are not blind assume a more structured model, like the Spectral Gaussian Mixture Models (Spectral-GMMs), thus exploiting statistical diversity of audio sources in the separatio...
A General Framework for Online Audio Source Separation
We consider the problem of online audio source separation. Existing algorithms adopt either a sliding block approach or a stochastic gradient approach, which is faster but less accurate. Also, they rely either on spatial cues or on spectral cues and cannot separate certain mixtures. In this paper, we design a general online audio source separation framework that combines both approaches and bot...
Blind Audio Source Separation using Short+Long Term AR Source Models and Iterative Itakura-Saito Distance Minimization
Blind audio source separation (BASS) arises in a number of applications in speech and music processing such as speech enhancement, speaker diarization, automated music transcription etc. Generally, BASS methods consider multichannel signal capture. The single microphone case is the most difficult underdetermined case, but it often arises in practice. In the approach considered here, the main so...
Single-Channel Mixture Decomposition Using Bayesian Harmonic Models
We consider the source separation problem for single-channel music signals. After a brief review of existing methods, we focus on decomposing a mixture into components made of harmonic sinusoidal partials. We address this problem in the Bayesian framework by building a probabilistic model of the mixture combining generic priors for harmonicity, spectral envelope, note duration and continuity. E...
A General Modular Framework for Audio Source Separation
Most audio source separation methods are developed for a particular scenario characterized by the number of sources and channels and the characteristics of the sources and the mixing process. In this paper we introduce a general modular audio source separation framework based on a library of flexible source models that enable the incorporation of prior knowledge about the characteristics of ...
Journal title:
- Signal Processing
Volume 92, issue
Pages -
Publication date 2012